{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "a58575c6-59c4-4289-869f-f5a1ac7e021c",
   "metadata": {},
   "source": [
    "# Search Within Videos with Text\n",
    "\n",
    "## Introduction\n",
    "This notebook guides you through the process of searching for specific textual information within videos and retrieving relevant video segments. To achieve this, we leverage various libraries and techniques, including:\n",
    "* clip: A library for vision and language understanding.\n",
    "* PIL: Python Imaging Library for image processing.\n",
    "* torch: The PyTorch library for deep learning.\n",
    "\n",
    "\n",
    "Searching within videos with text has practical applications in various domains:\n",
    "\n",
    "1. **Video Indexing:** People can find specific topics within videos, enhancing search experiences.\n",
    "\n",
    "2. **Content Moderation:** Social media platforms use text-based searches to identify and moderate content violations.\n",
    "\n",
    "3. **Content Discovery:** Users search for specific scenes or moments within movies or TV shows using text queries. Security personnel can search within video footage for specific incidents or individuals. \n",
    "\n",
    "Your imagination is your limit. Basically, all this example is doing is making the video like a blog post and searchable as well!\n",
    "\n",
    "Here is the example. "
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6eec562900dd0cff",
   "metadata": {
    "collapsed": false
   },
   "source": [
    "## Prerequisites\n",
    "\n",
    "Before diving into the implementation, ensure that you have the necessary libraries installed by running the following commands:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "ab56c57e-fa04-43dd-9670-ade9b5c6d4ac",
   "metadata": {},
   "outputs": [],
   "source": [
    "!pip install superduperdb\n",
    "!pip install ipython opencv-python pillow openai-clip"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f559fff0-df68-473a-94a2-afe39e4d5577",
   "metadata": {},
   "source": [
    "## Connect to datastore \n",
    "\n",
    "First, we need to establish a connection to a MongoDB datastore via SuperDuperDB. You can configure the `MongoDB_URI` based on your specific setup. \n",
    "Here are some examples of MongoDB URIs:\n",
    "\n",
    "* For testing (default connection): `mongomock://test`\n",
    "* Local MongoDB instance: `mongodb://localhost:27017`\n",
    "* MongoDB with authentication: `mongodb://superduper:superduper@mongodb:27017/documents`\n",
    "* MongoDB Atlas: `mongodb+srv://<username>:<password>@<atlas_cluster>/<database>`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "99de0e3d-8918-4fc4-a45b-0a58b70793c6",
   "metadata": {},
   "outputs": [],
   "source": [
    "from superduperdb import superduper, CFG\n",
    "from superduperdb.backends.mongodb import Collection\n",
    "import os\n",
    "\n",
    "# Use hybrid storage\n",
    "CFG.force_set('downloads_folder', './data')\n",
    "\n",
    "# Define the MongoDB URI, with a default value if not provided\n",
    "mongodb_uri = os.getenv(\"MONGODB_URI\", \"mongomock://test\")\n",
    "\n",
    "# SuperDuperDB, now handles your MongoDB database\n",
    "# It just super dupers your database by initializing a SuperDuperDB datalayer instance with a MongoDB backend and filesystem-based artifact store\n",
    "db = superduper(mongodb_uri, artifact_store='filesystem://./data/', downloads_folder='./data')\n",
    "\n",
    "# Create a collection named 'videos'\n",
    "video_collection = Collection('videos')"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1e53ce4113115246",
   "metadata": {
    "collapsed": false
   },
   "source": [
    "## Load Dataset\n",
    "\n",
    "We'll begin by configuring a video encoder."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "ebac4921-5c83-4ba7-b793-67f5f90d42ec",
   "metadata": {},
   "outputs": [],
   "source": [
    "from superduperdb import Encoder\n",
    "\n",
    "# Create an instance of the Encoder with the identifier 'video_on_file' and load_hybrid set to False\n",
    "vid_enc = Encoder(\n",
    "    identifier='video_on_file',\n",
    "    load_hybrid=False,\n",
    ")\n",
    "\n",
    "# Add the Encoder instance to the SuperDuperDB instance\n",
    "db.add(vid_enc)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "bf1cef0e-21ac-4291-b2c8-41065717ee67",
   "metadata": {},
   "source": [
    "Let's fetch a sample video from the internet and insert it into our collection."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "ee6335cb-960d-4239-be6e-501d52b88026",
   "metadata": {},
   "outputs": [],
   "source": [
    "from superduperdb.base.document import Document\n",
    "\n",
    "# Insert a video document into the 'videos' collection\n",
    "db.execute(\n",
    "    video_collection.insert_one(\n",
    "        Document({'video': vid_enc(uri='https://superduperdb-public.s3.eu-west-1.amazonaws.com/animals_excerpt.mp4')}) # Encodes the video\n",
    "    )\n",
    ")\n",
    "\n",
    "# Display the list of videos in the 'videos' collection\n",
    "list(db.execute(Collection('videos').find()))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "8f6bf3fa-2a2b-44c4-8a95-328a515c90c0",
   "metadata": {},
   "outputs": [],
   "source": [
    "db.execute(video_collection.find_one())"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "441fe6d6a9dee06b",
   "metadata": {
    "collapsed": false
   },
   "source": [
    "## Register Encoders\n",
    "\n",
    "Now, let's set up encoders to process videos and extract frames. These encoders will assist in converting videos into individual frames."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "2af2d178-9ff2-496d-8293-e5aee3f12a19",
   "metadata": {},
   "outputs": [],
   "source": [
    "import cv2\n",
    "import tqdm\n",
    "from PIL import Image\n",
    "from superduperdb.ext.pillow import pil_image\n",
    "from superduperdb import Model, Schema\n",
    "\n",
    "\n",
    "# Define a function to convert a video file into a list of images\n",
    "def video2images(video_file):\n",
    "    # Set the sampling frequency for frames\n",
    "    sample_freq = 10\n",
    "    \n",
    "    # Open the video file using OpenCV\n",
    "    cap = cv2.VideoCapture(video_file)\n",
    "    \n",
    "    # Initialize variables\n",
    "    frame_count = 0\n",
    "    fps = cap.get(cv2.CAP_PROP_FPS)\n",
    "    extracted_frames = []\n",
    "    progress = tqdm.tqdm()\n",
    "\n",
    "    # Iterate through video frames\n",
    "    while True:\n",
    "        ret, frame = cap.read()\n",
    "        if not ret:\n",
    "            break\n",
    "        \n",
    "        # Get the current timestamp based on frame count and FPS\n",
    "        current_timestamp = frame_count // fps\n",
    "        \n",
    "        # Sample frames based on the specified frequency\n",
    "        if frame_count % sample_freq == 0:\n",
    "            extracted_frames.append({\n",
    "                'image': Image.fromarray(frame[:,:,::-1]),  # Convert BGR to RGB\n",
    "                'current_timestamp': current_timestamp,\n",
    "            })\n",
    "        frame_count += 1\n",
    "        progress.update(1)\n",
    "    \n",
    "    # Release resources\n",
    "    cap.release()\n",
    "    cv2.destroyAllWindows()\n",
    "    \n",
    "    # Return the list of extracted frames\n",
    "    return extracted_frames\n",
    "\n",
    "# Create a SuperDuperDB model for the video2images function\n",
    "video2images_model = Model(\n",
    "    identifier='video2images',\n",
    "    object=video2images,\n",
    "    flatten=True,\n",
    "    model_update_kwargs={'document_embedded': False},\n",
    "    output_schema=Schema(identifier='myschema', fields={'image': pil_image})\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "19a28dbe-dcec-4c6b-bd2d-72dbd48daf39",
   "metadata": {},
   "source": [
    "Additionally, we'll configure a listener to continually download video URLs and store the best frames in another collection."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a30d093b-03d3-4bdb-aa8b-46ff974d1995",
   "metadata": {},
   "outputs": [],
   "source": [
    "from superduperdb import Listener\n",
    "\n",
    "# Add a listener to process videos using the video2images model\n",
    "db.add(\n",
    "   Listener(\n",
    "       model=video2images_model,  # Assuming video2images is your SuperDuperDB model\n",
    "       select=video_collection.find(),\n",
    "       key='video',\n",
    "   )\n",
    ")\n",
    "\n",
    "# Get the unpacked outputs of the video2images process for a specific video\n",
    "outputs = db.execute(Collection('_outputs.video.video2images').find_one()).unpack()\n",
    "\n",
    "# Display the image output from the processed video\n",
    "image_output = outputs['_outputs']['video']['video2images']['0']['image']\n",
    "\n",
    "image_output"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8ef3c353-fcc4-4f23-892b-c8a3796f952c",
   "metadata": {},
   "source": [
    "## Create CLIP Model\n",
    "\n",
    "Now, let's establish a model for CLIP (Contrastive Language-Image Pre-training), serving for both visual and textual analysis."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "bd7329ea-75d1-4275-b754-1a977e76161a",
   "metadata": {},
   "outputs": [],
   "source": [
    "import clip\n",
    "from superduperdb import vector\n",
    "from superduperdb.ext.torch import TorchModel\n",
    "\n",
    "# Load the CLIP model and define a tensor type\n",
    "model, preprocess = clip.load(\"RN50\", device='cpu')\n",
    "t = vector(shape=(1024,))\n",
    "\n",
    "# Create a TorchModel for visual encoding\n",
    "visual_model = TorchModel(\n",
    "    identifier='clip_image',\n",
    "    preprocess=preprocess,\n",
    "    object=model.visual,\n",
    "    encoder=t,\n",
    "    postprocess=lambda x: x.tolist(),\n",
    ")\n",
    "\n",
    "# Create a TorchModel for text encoding\n",
    "text_model = TorchModel(\n",
    "    identifier='clip_text',\n",
    "    object=model,\n",
    "    preprocess=lambda x: clip.tokenize(x)[0],\n",
    "    forward_method='encode_text',\n",
    "    encoder=t,\n",
    "    device='cpu',  # Specify the device for text encoding\n",
    "    preferred_devices=None,  # Specify preferred devices for model execution\n",
    "    postprocess=lambda x: x.tolist(),\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "dfa470d9-35d0-4c53-a5d0-afba5456320a",
   "metadata": {},
   "source": [
    "## Create VectorIndex\n",
    "\n",
    "We'll now establish a VectorIndex to index and search the video frames based on both visual and textual content. This includes creating an indexing listener for visual data and a compatible listener for textual data."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "475d27c7-81e0-47ae-a02b-b1df4332002c",
   "metadata": {},
   "outputs": [],
   "source": [
    "from superduperdb import Listener, VectorIndex\n",
    "from superduperdb.backends.mongodb import Collection\n",
    "\n",
    "# Add a VectorIndex for video search\n",
    "db.add(\n",
    "    VectorIndex(\n",
    "        identifier='video_search_index',\n",
    "        indexing_listener=Listener(\n",
    "            model=visual_model, # Visual model for image processing\n",
    "            key='_outputs.video.video2images.0.image', # Visual model for image processing\n",
    "            select=Collection('_outputs.video.video2images').find(), # Collection containing video image data\n",
    "        ),\n",
    "        compatible_listener=Listener(\n",
    "            model=text_model,  # Text model for processing associated text data\n",
    "            key='text',\n",
    "            select=None,\n",
    "            active=False\n",
    "        )\n",
    "    )\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "95c48d0c-4f7a-4c32-a2e3-3f8d8985733a",
   "metadata": {},
   "source": [
    "## Query Text Against Saved Frames\n",
    "\n",
    "Now, let's search for something that happened during the video:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "31ba463f-97ae-4f83-890e-c852f9818e63",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Define the search parameters\n",
    "search_term = 'Some ducks'\n",
    "num_results = 1\n",
    "\n",
    "# Execute the search and get the next result\n",
    "r = next(db.execute(\n",
    "    Collection('_outputs.video.video2images')\n",
    "    .like(Document({'text': search_term}), vector_index='video_search_index', n=num_results)\n",
    "    .find()\n",
    "))\n",
    "\n",
    "# Extract the timestamp from the search result\n",
    "search_timestamp = r['_outputs']['video']['video2images']['0']['current_timestamp']\n",
    "\n",
    "# Retrieve the back reference to the original video using the '_source' field\n",
    "video = db.execute(Collection('videos').find_one({'_id': r['_source']}))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "78fc11ff-dafc-4525-88a5-327ed547b89e",
   "metadata": {},
   "source": [
    "## Start the Video from the Resultant Timestamp\n",
    "\n",
    "Finally, we can display and play the video starting from the timestamp where the searched text is found."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "eeda6711-15a4-465e-903d-ed0a1d0db672",
   "metadata": {},
   "outputs": [],
   "source": [
    "from IPython.display import display, HTML\n",
    "\n",
    "# Create HTML code for the video player with a specified source and controls\n",
    "video_html = f\"\"\"\n",
    "<video width=\"640\" height=\"480\" controls>\n",
    "    <source src=\"{video['video'].uri}\" type=\"video/mp4\">\n",
    "</video>\n",
    "<script>\n",
    "    // Get the video element\n",
    "    var video = document.querySelector('video');\n",
    "    \n",
    "    // Set the current time of the video to the specified timestamp\n",
    "    video.currentTime = {search_timestamp};\n",
    "    \n",
    "    // Play the video automatically\n",
    "    video.play();\n",
    "</script>\n",
    "\"\"\"\n",
    "\n",
    "# Display the HTML code in the notebook\n",
    "display(HTML(video_html))"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.6"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
